AITopics | pre-training dataset

pre-training dataset

Information about AI from the News, Publications, and Conferences

505259756244493872b7709a8a01b536-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 21:26:31 GMT

artificial intelligence, machine learning, top-5 pseudo-target, (15 more...)

Neural Information Processing Systems

Industry:

Leisure & Entertainment (0.70)
Media > Music (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Appendix information on the relationship between our training approach and domain adaptation

Neural Information Processing Systems, Apr-24-2026, 21:30:31 GMT

Here we note that our problem definition of pre-training is fundamentally different from domain adaptation (DA) [S1, S2, S3, S4, S5, S6], in order to prevent any confusion between this work and domain adaptation methods. DA applies a model trained on a pre-training dataset (i.e., source dataset) to a different target dataset [21, 42]. Self-supervised pre-training differs from domain adaptation in four key ways. Domain adaptation methods usually restrict pre-training and target datasets to have the same feature space (but possibly different distributions), e.g., [S22, S18, S19, S20, S13].
In summary, to support transfer learning across different time series datasets, a pre-training approach must capture a generalizable property of time series, one that is shared across datasets regardless of the specific semantic meaning of the signal (e.g., ECG, EMG, acceleration, vibration), the conditions of data acquisition (e.g., variation across subjects and devices), sampling frequencies, etc. This work develops a self-supervised contrastive pre-training strategy that fulfills these requirements by injecting an appropriate inductive bias (called Time-Frequency Consistency, TF-C) into the model (Sec. …).
Further, we clarify that the term 'self-supervised' has different meanings in DA and in pre-training [S23, S24, S25, S26]. 'Self-supervised domain adaptation' [S27, S16, S21, S15] or 'unsupervised domain adaptation' [S1, S22, S28, S11, S14] means that there are no labels in the target dataset; however, it still requires labels in the pre-training dataset. In contrast, 'self-supervised pre-training' [S29, S30, S31] (i.e., the problem studied here, in line with a breadth of existing literature on pre-training) indicates the setting where no labels are available in pre-training.
Up to the submission of this manuscript, there are no existing contrastive augmentations in the frequency domain of time series. Two models, CoST [49] and BTSF [50], involve the frequency domain in contrastive learning; however, the proposed TF-C is fundamentally different from them in the following aspects. We take BTSF as an example, while the differences also apply to CoST. The problem definitions of the two papers differ. Our method is designed to produce generalizable representations that can transfer to a different time series dataset (going from a pre-training to a fine-tuning dataset) for the purpose of transfer learning.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Energy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Clip_Dataset__NeurIPS2022_ (10)

Thao Nguyen

Neural Information Processing Systems, Feb-19-2026, 07:23:28 GMT

We find that the performance of the pre-training data varies substantially across distribution shifts, with no single data source dominating.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country: Europe > Poland (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

6a5c23219f401f3efd322579002dbb80-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-19-2026, 05:10:04 GMT

benchmark, swintrack-b-384, tracker, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Priming for Sample-Efficient Adaptation

Matthew Wallingford, Vivek Ramanujan, Alex Fang, Aditya Kusupati

Neural Information Processing Systems, Feb-17-2026, 05:02:07 GMT

Presented with class names or unlabeled test samples, Neural Priming enables the model to recall relevant data seen throughout pretraining and condition its parameters on it, thereby priming it for the test distribution. Neural Priming can be performed at inference, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks.

large language model, machine learning, neural priming, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)
(3 more...)

ca9567d8ef6b2ea2da0d7eed57b933ee-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 02:46:03 GMT

artificial intelligence, machine learning, pre-training dataset, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

Adam J. Stewart

Neural Information Processing Systems, Feb-16-2026, 19:41:06 GMT

The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. The multispectral imagery captured by sensors onboard these satellites is critical for a wide range of scientific fields. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis due to the prevalence of small labeled datasets and lack of foundation models. In this paper, we introduce SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites (including 3 sensors and 2 product levels) and the largest Landsat dataset in history (5M image patches). Additionally, we modernize and re-release the L7 Irish and L8 Biome cloud detection datasets, and introduce the first ML benchmark datasets for Landsats 4-5 TM and Landsat 7 ETM+ SR. Finally, we pre-train the first foundation models for Landsat imagery using SSL4EO-L and evaluate their performance on multiple semantic segmentation tasks.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: